Multi-scale retrieval in MEI: an English-Chinese translingual speech retrieval system
Abstract
This paper presents a multi-scale retrieval approach in MEI (Mandarin-English Information), an English-Chinese cross-lingual spoken document retrieval (CL-SDR) system. MEI accepts an entire English news story (newspaper text) as the input query and automatically retrieves "relevant" Mandarin news stories (broadcast audio). This lets the user search for personally relevant content across language and media barriers, which makes it a cross-lingual and cross-media retrieval task. MEI advocates a multi-scale paradigm for this task, where "multi-scale" refers to the use of both words and subwords (Chinese characters and syllables) for retrieval. Words offer lexical knowledge that enhances precision, while subwords can potentially alleviate several prevailing problems in CL-SDR, e.g., open vocabularies in translation and recognition, out-of-vocabulary words in audio indexing, and ambiguities in Chinese homophones and word tokenization. We present techniques for word-subword fusion, which improved retrieval performance in our experiments with the Topic Detection and Tracking (TDT) collection.
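The abstract does not specify how the word and subword evidence is fused. The sketch below is only a rough illustration of the multi-scale idea. It assumes overlapping character and syllable bigrams as subword units, cosine similarity on term counts at each scale, and a weighted linear combination of the per-scale scores. Linear score interpolation is a common fusion baseline and is not necessarily MEI's technique. The names `LEXICON`, `fused_score`, and the weights are all hypothetical, and the toy lexicon stands in for a real pronunciation dictionary or recognizer output.

```python
from collections import Counter
import math

# Hypothetical toy pronunciation lexicon: Chinese word -> toneless pinyin syllables.
# A real system would take syllables from a full lexicon or from the recognizer.
LEXICON = {
    "香港": ["xiang", "gang"],
    "经济": ["jing", "ji"],   # "economy"
    "经纪": ["jing", "ji"],   # "broker", a homophone of 经济
    "增长": ["zeng", "zhang"],
    "回归": ["hui", "gui"],
}

def word_units(words):
    # Word scale: the tokenized words themselves.
    return list(words)

def char_bigrams(words):
    # Character scale: overlapping bigrams over the concatenated characters,
    # so the result does not depend on where word boundaries were placed.
    chars = "".join(words)
    return [chars[i:i + 2] for i in range(len(chars) - 1)]

def syllable_bigrams(words):
    # Syllable scale: overlapping bigrams of pinyin syllables, which lets
    # homophones (e.g. recognizer substitutions) still match.
    sylls = [s for w in words for s in LEXICON.get(w, [])]
    return [" ".join(sylls[i:i + 2]) for i in range(len(sylls) - 1)]

def cosine(a, b):
    # Cosine similarity between two bags of indexing units.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca if t in cb)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

SCALES = {"word": word_units, "char": char_bigrams, "syllable": syllable_bigrams}

def fused_score(query_words, doc_words, weights):
    # Weighted linear fusion of per-scale similarity scores (an assumed scheme).
    return sum(w * cosine(SCALES[s](query_words), SCALES[s](doc_words))
               for s, w in weights.items())

if __name__ == "__main__":
    query = ["香港", "经济", "增长"]  # translated query words (hypothetical)
    weights = {"word": 0.5, "char": 0.25, "syllable": 0.25}
    for doc in (["香港", "经纪"], ["回归"]):
        print(doc, round(fused_score(query, doc, weights), 3))
```

In this toy setup, a document whose recognized transcript contains the homophone 经纪 instead of 经济 loses word-level and character-level overlap but keeps full syllable-level overlap. This reflects the abstract's point that subwords can recover matches that words miss. The fusion weights here are arbitrary. In practice they would be tuned on held-out data.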
Similar resources
Multi-scale-audio indexing for translingual spoken document retrieval
MEI (Mandarin-English Information) is an English-Chinese crosslingual spoken document retrieval (CL-SDR) system developed during the Johns Hopkins University Summer Workshop 2000. We integrate speech recognition, machine translation, and information retrieval technologies to perform CL-SDR. MEI advocates a multi-scale paradigm, where both Chinese words and subwords (characters and syllables) ar...
Mandarin-English Information (MEI): investigating translingual speech retrieval
This paper describes the Mandarin–English Information (MEI) project, where we investigated the problem of cross-language spoken document retrieval (CL-SDR), and developed one of the first English–Chinese CL-SDR systems. Our system accepts an entire English news story (text) as query, and retrieves relevant Chinese broadcast news stories (audio) from the document collection. Hence, this is a cross-langua...
Mandarin-English Information (MEI)
Mandarin-English Information (MEI) is one of the four projects selected for the Johns Hopkins University Summer Workshop 2000. We plan to develop technologies for using written queries to search spoken documents (cross-media) between English and Mandarin Chinese (cross-language). Our research focus is on the integration of speech recognition and machine translation technologies in the context o...
Cross Language Information Retrieval for Digital Museums
The trend toward information globalization has brought new challenges for digital libraries. On the one hand, it is often necessary for a digital library to share its valuable resources with users of different languages. On the other hand, it is also necessary for a DL user to utilize knowledge presented in a foreign language. This paper deals with the translingual issue on the design of a digi...
Generating Phonetic Cognates to Handle Named Entities in English-Chinese Cross-Language Spoken Document Retrieval
We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllable...
Publication date: 2001